[Figures: plots against training set size for the single 100-10-10 network and the original boosted algorithm (100-10-10 network)]

DATABASE:

120,000 DIGITS FROM NIST
2000 FOR TESTING
10X10 INPUT SPACE

A Commercial Application:
Extracting  Document  Content 
from  Images 


Christopher  L.  Scofield 
Harry  Chang 
Ed  Collins 

Structure of the Problem


•  OCR  is  not  really  a  problem  of  character 
recognition.  It  is  really  language  processing 
from  images: 

Character  context  drives  segmentation: 

[Image: handwritten sample]


Lexical  context  drives  character  interpretation: 

[Image: handwritten sample]

Nestor, Inc.


Structure of the Problem


Lexical context drives character interpretation:


[Image: handwritten sample]


clone? 

done? 


Application  specific  rules  drive  interpretation: 


[Image: handwritten sample]


54Y293? 

54Y2A3? 

J4Y2A3? 

Structure of the Problem


Document structure drives syntactic and lexical possibilities:

[Image: handwritten address sample]

Company  Name 

Street  number  Street  name 

City  State  Zip 

What are the possible
approaches? 

Single  Network  Architecture 

» [Keeler90]: Combined segmentation and recognition;
»  [Fontaine92]:  RNN  trained  on  pixel-columns 

-  Pluses: 

»  Makes  no  assumption  about  structure  of  problem 
»  Automatically  trains  each  part  of  problem 

-  Minuses: 

»  Scaling  problem 

»  Lack  of  modularity:  application  dependent 

Possible methods


Multiple  Network  Architecture 

»  [Gouin92]:  Neural  network  segmentation  for  map 
processing; 

»  [Scofield92]:  Context-driven  segmentation, 
recognition 

-  Pluses: 

»  Each  module  can  be  built  in  a  minimal  fashion 

»  Only  some  parts  need  to  be  changed  for  new 
applications 

-  Minuses: 

»  Assumes  prior  knowledge 

»  Must  be  assembled  in  a  piecewise  fashion 

»  Credit  assignment  problem 




A  Multiple  Network  Approach  to 
OCR  for  Handwriting 

Task  decomposition: 


Neural  Network  Segmentation 

•  Neural  network  is  used  to  assemble  a  tree 
representation  of  the  image: 

(1) Classify all blobs into “Character”, “Noise”, and “Mixed”


(2)  Recursively  segment  “Mixed”  until  decomposed  into 
only  terminal  nodes  “Character”  and  “Noise” 

(3)  Compose  a  list  of  possible  alternative  segmentations 

Fragmented  characters 
Optimal  window  adjustment 



Segmentation: Step 1


3-layer BPN trained to classify blobs into:


“Mixed”


“Character”  “Noise”



Use  connectivity  analysis  features  [Hu62] 
including: 

area,  perimeter,  number  of  holes, 
area of holes, principal moments,
aspect  ratio,  etc. 
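
A few of the listed connectivity features can be computed directly from a binary blob. The sketch below is only illustrative: the function name `blob_features`, the `#`/`.` image encoding and the 4-connected definition of a hole are assumptions, not details from the slides, and moment features are left out.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def blob_features(grid):
    """grid: list of equal-length strings, '#' = ink, '.' = background.

    Assumes the blob contains at least one ink pixel.
    """
    h, w = len(grid), len(grid[0])
    ink = [(r, c) for r in range(h) for c in range(w) if grid[r][c] == "#"]

    # Perimeter: ink-pixel edges that face background or the image border.
    perimeter = 0
    for r, c in ink:
        for dr, dc in NEIGHBOURS:
            rr, cc = r + dr, c + dc
            if not (0 <= rr < h and 0 <= cc < w) or grid[rr][cc] != "#":
                perimeter += 1

    # Holes: 4-connected background regions that never touch the border.
    seen, holes, hole_area = set(), 0, 0
    for r in range(h):
        for c in range(w):
            if grid[r][c] == "#" or (r, c) in seen:
                continue
            queue, size, on_border = deque([(r, c)]), 0, False
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                size += 1
                if y in (0, h - 1) or x in (0, w - 1):
                    on_border = True
                for dy, dx in NEIGHBOURS:
                    yy, xx = y + dy, x + dx
                    if (0 <= yy < h and 0 <= xx < w
                            and grid[yy][xx] != "#" and (yy, xx) not in seen):
                        seen.add((yy, xx))
                        queue.append((yy, xx))
            if not on_border:
                holes += 1
                hole_area += size

    rows = [r for r, _ in ink]
    cols = [c for _, c in ink]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return {
        "area": len(ink),
        "perimeter": perimeter,
        "holes": holes,
        "hole_area": hole_area,
        "aspect_ratio": width / height,
    }

# A small ring-shaped blob, roughly a handwritten zero.
ZERO = [
    ".....",
    ".###.",
    ".#.#.",
    ".#.#.",
    ".###.",
    ".....",
]
features = blob_features(ZERO)
```

A feature vector of this kind would then be fed to the blob classifier; which features, and in what scaling, the actual network used is not stated beyond the list above.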

Segmentation: Step 2


• “C”, “N” are terminal,

“M” parsed with quad-tree analysis [SametSO]

Re-classified  at  each 
step: 
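
The recursion in Step 2 can be sketched as follows. This is a minimal illustration, not the production method: `toy_classify` is a hypothetical stand-in for the trained blob network (it calls a region “C” when it is all ink and “N” when it is empty), and treating an unsplittable “M” pixel as noise is an assumption.

```python
def split_quadrants(region):
    """Split a region (list of strings) into up to four non-empty quadrants."""
    mid_r, mid_c = len(region) // 2, len(region[0]) // 2
    quadrants = []
    for rows in (region[:mid_r], region[mid_r:]):
        if not rows:
            continue
        for cols in (slice(0, mid_c), slice(mid_c, None)):
            quad = [row[cols] for row in rows]
            if quad[0]:
                quadrants.append(quad)
    return quadrants

def segment(region, classify):
    """Return 'C' or 'N' for terminal regions, or a list of sub-trees for 'M'."""
    label = classify(region)          # re-classified at every step
    if label in ("C", "N"):
        return label
    if len(region) * len(region[0]) == 1:
        return "N"                    # assumption: leftover single pixel is noise
    return [segment(q, classify) for q in split_quadrants(region)]

def toy_classify(region):
    """Hypothetical stand-in for the 3-layer blob classifier."""
    pixels = "".join(region)
    if "#" not in pixels:
        return "N"
    if "." not in pixels:
        return "C"
    return "M"

IMAGE = [
    "##..",
    "##..",
    "..#.",
    "....",
]
tree = segment(IMAGE, toy_classify)
# Quadrant order is top-left, top-right, bottom-left, bottom-right.
```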

Segmentation: Step 2 (contd):

-  Hierarchical  Agglomerative  Clustering  [Duda72]  groups 
terminal  nodes;  re-classified  to  ensure  still  terminal: 
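
A minimal sketch of the grouping step, assuming each terminal node is summarised by a centroid and that groups are merged by single-linkage distance until a threshold is exceeded. The linkage rule, the threshold and the function name `agglomerate` are assumptions; the re-classification of each merged group is omitted here.

```python
import math

def agglomerate(points, max_dist):
    """Greedy single-linkage clustering of 2-D points.

    Repeatedly merges the two closest clusters until the closest pair
    is farther apart than max_dist.
    """
    clusters = [[p] for p in points]

    def linkage(a, b):
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > max_dist:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Centroids of four terminal nodes: two close pairs, far from each other.
groups = agglomerate([(0, 0), (1, 0), (10, 0), (11, 0)], max_dist=2)
```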

Step  3: 

• List of segmentation alternatives compiled for
classification into characters

- Multiple “cuts” of characters provided for later analysis:



Segmentation  Accuracy 

• Test set consists of 5,654 hand-printed/machine-printed
(HP/MP) characters in 1,236 words (46% HP) selected from 53
real-world documents

•  HP  data  consists  of  live  forms  with 
constrained  HP,  unconstrained  HP,  run-on 
HP  and  some  cursive 


• Character segmentation accuracy
(first choice correctly segmented):

Segmentation Network          Correct (%)  Incorrect (%)
“Blob” features (7-10-3)      92.7         7.3


Character  Recognition: 
Overview 


•  Segmentation  alternatives  are  processed  for 
character  class 

• Use two static feature sets (cf. [LeCun90])

•  Three-layer,  feedforward  BPNs  are  used  as 
estimators  of  a  posteriori  probabilities 

•  We  have  employed  three  types  of  hybrid 
networks: 

- Glue networks [Waibel88]

- Parallel Experts [Reilly87]

- Hierarchical Filters [Reilly87]

Character Recognition:
Feature  Extraction 

•  Segmentation  alternatives  are  converted  to 
grey-level:  gaussian  kernel  estimated  from  line 
widths 

•  Pixel  Feature  Set: 

- Pixel map is sub-sampled with grid producing coarse
map  (100-element  grid) 

•  Edge  Feature  Set: 

-  Edge-map  produced  from  grey-level  gradient  estimation 
[Roberts65] (4 edge directions)

-  Edge  map  is  sub-sampled  with  grid  producing  coarse 
edge  map  (30-element  grid) 
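
The two feature sets can be sketched as below. Several details are assumptions made only for illustration: average pooling for the sub-sampling, a 6 x 5 layout for the 30-element grid, and half-wave rectification of the two Roberts diagonal differences to obtain four direction maps. With those choices the edge vector has 4 x 30 = 120 elements and the pixel vector 100.

```python
def subsample(image, out_rows, out_cols):
    """Average-pool a 2-D list of floats down to out_rows x out_cols.

    Assumes the image has at least out_rows rows and out_cols columns.
    """
    rows, cols = len(image), len(image[0])
    coarse = []
    for i in range(out_rows):
        r0, r1 = i * rows // out_rows, (i + 1) * rows // out_rows
        line = []
        for j in range(out_cols):
            c0, c1 = j * cols // out_cols, (j + 1) * cols // out_cols
            cell = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            line.append(sum(cell) / len(cell))
        coarse.append(line)
    return coarse

def pixel_features(image, grid=(10, 10)):
    """Coarse pixel map flattened to a vector (100 elements by default)."""
    return [v for line in subsample(image, *grid) for v in line]

def edge_features(image, grid=(6, 5)):
    """Four rectified Roberts-cross maps, each sub-sampled and flattened."""
    rows, cols = len(image), len(image[0])
    maps = [[[0.0] * (cols - 1) for _ in range(rows - 1)] for _ in range(4)]
    for r in range(rows - 1):
        for c in range(cols - 1):
            g1 = image[r][c] - image[r + 1][c + 1]      # one diagonal
            g2 = image[r][c + 1] - image[r + 1][c]      # the other diagonal
            maps[0][r][c] = max(g1, 0.0)
            maps[1][r][c] = max(-g1, 0.0)
            maps[2][r][c] = max(g2, 0.0)
            maps[3][r][c] = max(-g2, 0.0)
    features = []
    for m in maps:
        for line in subsample(m, *grid):
            features.extend(line)
    return features

# A 20 x 20 horizontal intensity ramp as a stand-in for a grey-level character.
ramp = [[float(c) for c in range(20)] for _ in range(20)]
pixel_vec = pixel_features(ramp)
edge_vec = edge_features(ramp)
```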

Character Recognition Classifiers

•  Features  used  to  train  3-layer,  feedforward  BPNs 

Data Set                     # Authors   Digits    Alpha U/LC

Train: NIST 1,3; Propr.      2600        265,000   120,000

Test:  Propr.                            4,767     12,932

•  In  addition  to  using  “Forced  accuracy”,  can  use  a 
heuristic  which  models  high  cost  of  errors: 

“Figure  of  Merit”:  FM  =  100  -  10(%E)  -  %R 


Numeric Network              FM      Correct  Inc.   Reject  Forced

Edge features (120-32-10)    95.15   97.04    0.21   2.75    99.01

Pixel features (100-45-10)   93.41   95.87    0.27   3.86    98.59
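
As a quick check of the figure of merit defined above, the sketch below recomputes it from the error and reject percentages. The function name and the `error_cost` parameter are illustrative; the factor of 10 is the one in the formula.

```python
def figure_of_merit(pct_error, pct_reject, error_cost=10.0):
    """FM = 100 - 10(%E) - %R: each error costs ten times a reject."""
    return 100.0 - error_cost * pct_error - pct_reject

# Edge-feature network: 0.21% incorrect, 2.75% rejected.
fm_edge = figure_of_merit(0.21, 2.75)
print(round(fm_edge, 2))
```

The other rows computed this way can differ from the tabulated FM in the second decimal place, presumably because the printed percentages are themselves rounded.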

Classifier Analysis

Using “rule-of-thumb” e = W/T [Baum89]:

Training set: T = 265,000

Edge Net: W = 120*32 + 32*10 = 4,160

Expected  test  error:  e  =  1.7% 

Pixel Net: W = 100*45 + 45*10 = 4,950

Expected  test  error:  e  =  1.9% 
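
The rule of thumb is simple enough to evaluate directly. In this sketch the helper names are illustrative and bias weights are ignored, as in the weight counts on the slide.

```python
def weight_count(layers):
    """Weights between consecutive fully connected layers, ignoring biases."""
    return sum(a * b for a, b in zip(layers, layers[1:]))

def expected_error(layers, training_examples):
    """Rule-of-thumb test error e = W / T."""
    return weight_count(layers) / training_examples

T = 265_000
for name, layers in (("edge", [120, 32, 10]), ("pixel", [100, 45, 10])):
    w = weight_count(layers)
    print(f"{name}: W = {w}, e = {100 * expected_error(layers, T):.2f}%")
```

With these inputs the pixel net rounds to the 1.9% shown, while the edge net evaluates to roughly 1.6%, slightly below the 1.7% on the slide; the gap may come from rounding or a slightly different weight count.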



Character Recognition:
Parallel Experts

• How to combine the results from two networks?

• Could vote if we had many “experts”. With only two,
average the activation (probability) vectors:


$P_i = \frac{1}{2}\left(P^{e}_i + P^{p}_i\right)$   (e: edge-feature network, p: pixel-feature network)
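
A minimal sketch of the averaging rule, assuming each network outputs one probability per class. The `decide` helper and its reject threshold are hypothetical additions to show how a reject option could sit on top of the averaged vector.

```python
def combine(p_edge, p_pixel):
    """Average two class-probability vectors element by element."""
    return [(a + b) / 2 for a, b in zip(p_edge, p_pixel)]

def decide(probs, reject_below=0.5):
    """Return the index of the most probable class, or None to reject."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= reject_below else None

# Two-class toy example in which the experts disagree.
p = combine([0.125, 0.875], [0.625, 0.375])
choice = decide(p)
```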


Numeric Network              FM      Correct  Inc.   Reject  Forced

Edge features (120-32-10)    95.15   97.04    0.21   2.75    99.01

Pixel features (100-45-10)   93.41   95.87    0.27   3.86    98.59

Parallel Nets:               97.25   98.20    0.10   1.70    99.39

Alphanumeric Character Recognition

•  Support  full  alphanumeric  HP 

•  Natural  decomposition  into  alpha  and 
numeric  subnets 

• Use glue-net architecture [Waibel88]:

-  Trained  3-layer  nets  for  alpha  (u/l  case)  and  numeric 

-  Freeze  middle-layer  weights,  route  activations  to  output 

- Add in new “glue” layer to resolve inter-class ambiguity

- Train second layer of weights and all glue-cell weights

[Figure: alpha and numeric sub-nets combined through a “glue” layer into an alphanumeric network]

Alphanumeric Accuracy


Results:

                                                Forced
Network Architecture                    FM      Correct  Incorrect
Numeric sub-net (120-32-10)             n/a     95.15    4.75
Alpha (U/L) sub-net (120-120-26)        n/a     91.41    8.59
Single Glue Net(1) (120-210-36)         51.24   89.24    10.76

Hierarchical (Super) Glue Net           54.04   90.80    9.20
Single Glue Net(2) (120-210-36)         54.46   89.98    10.02

[Figure: hierarchical glue net combining an alphanumeric net on pixel features with an alphanumeric net on edge features]

Glue Net Analysis


Digit set:             265,000
Digit Net:             W = 120*32 + 32*10 = 4,160
Expected test error:   e = 1.7%

Alpha set:             120,000
Alpha sub-net:         W = 120*120 + 120*26 = 18,720
Expected test error:   e = 15.6%

Full set:              385,000
Glue weights:          W = 120*58 + 210*36 = 14,520
Expected test error:   e = 3.8%
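
The glue-weight count is consistent with a 210-unit middle layer made up of the 32 frozen numeric units, the 120 frozen alpha units and 58 new glue units. That split is inferred from the arithmetic (32 + 120 + 58 = 210) rather than stated on the slide, so the sketch below should be read under that assumption; the function name is illustrative.

```python
def glue_trainable_weights(n_inputs, frozen_hidden, n_glue, n_outputs):
    """Count the weights that are trained when the sub-net hidden layers are frozen.

    Trained weights: input -> glue units, plus the whole second layer
    (frozen hidden units and glue units -> outputs). Biases are ignored.
    """
    hidden = sum(frozen_hidden) + n_glue
    return n_inputs * n_glue + hidden * n_outputs

W = glue_trainable_weights(n_inputs=120, frozen_hidden=[32, 120],
                           n_glue=58, n_outputs=36)
e = W / 385_000   # rule of thumb e = W / T on the full set
print(W, f"{100 * e:.1f}%")
```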

Character and Lexical Context


Word  recognition:  determine  the  best  string 
interpretation  given  all  sources  of  knowledge: 

segmentation  alternatives 
character  recognition  probabilities 
character  transition  probabilities 
lexical  context 

[Figure: handwritten word with alternative character interpretations]

Character and Lexical Context

Use  the  Viterbi  algorithm  [Viterbi67,  Forney73] 
to  select  the  character  string  with  maximum  a 
posteriori  probability 

Let  maximization  of  word  probability  drive 
proper  segmentation  [Bozinovic82] 


Problem: VA can produce lexically incorrect
strings. Post-processing with a dictionary can
produce a word that is not the MAP string.

Solution:  Use  lexical  context  to  trim  paths 
from  VA  search  ensuring  that  the  final  string  is 
both  MAP  and  lexically  correct  [Srihari83]. 
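
The difference between the plain Viterbi search and the lexically trimmed search can be sketched on a fixed segmentation. Everything specific here is an assumption for illustration: the toy probabilities, the three-word lexicon, the `floor` probability for unseen character pairs, and the use of word prefixes as search states in the trimmed version.

```python
import math

def viterbi(char_probs, trans, floor=1e-6):
    """Most probable string; the state is the last character."""
    best = {c: (math.log(p), c) for c, p in char_probs[0].items()}
    for probs in char_probs[1:]:
        nxt = {}
        for c, p in probs.items():
            score, string = max(
                (s + math.log(trans.get((prev, c), floor)), text)
                for prev, (s, text) in best.items()
            )
            nxt[c] = (score + math.log(p), string + c)
        best = nxt
    return max(best.values())[1]

def lexical_search(char_probs, trans, lexicon, floor=1e-6):
    """Same scoring, but paths that are not a prefix of a lexicon word are trimmed."""
    n = len(char_probs)
    prefixes = {w[:k] for w in lexicon if len(w) == n for k in range(1, n + 1)}
    paths = {c: math.log(p) for c, p in char_probs[0].items() if c in prefixes}
    for probs in char_probs[1:]:
        nxt = {}
        for prefix, score in paths.items():
            for c, p in probs.items():
                if prefix + c in prefixes:
                    step = math.log(trans.get((prefix[-1], c), floor))
                    nxt[prefix + c] = score + step + math.log(p)
        paths = nxt
    return max(paths, key=paths.get) if paths else None

# Per-position character probabilities from a hypothetical recognizer.
obs = [
    {"d": 0.9, "b": 0.1},
    {"o": 0.8, "a": 0.2},
    {"n": 0.4, "m": 0.6},
    {"e": 0.7, "c": 0.3},
]
lexicon = {"done", "bone", "dame"}
unconstrained = viterbi(obs, {})               # uniform transitions
constrained = lexical_search(obs, {}, lexicon)
```

In this toy case the unconstrained search returns a string outside the lexicon, while the trimmed search returns the most probable string among those the lexicon allows.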

Application Context Processing


Some  applications  are  alphanumeric  but  not 
part  of  lexicons: 


-  Inter-character  statistics  are  specialized,  hard  to  learn 
without  large  set 


User-definable  syntax  selects  which  subnet  to 
use  for  each  character  position,  trims 
segmentation  alternatives  to  match  syntax 
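
A syntax of this kind can be sketched as a per-position template. The template notation (`9` for a digit, `A` for an upper-case letter), the example template `99A9A9` and the function name are hypothetical; the slides do not give the actual syntax language.

```python
import string

# Hypothetical template symbols: which character class each position allows.
CLASSES = {"9": set(string.digits), "A": set(string.ascii_uppercase)}

def read_with_syntax(template, alternatives):
    """Pick the best reading that matches the template.

    alternatives: list of segmentation alternatives, each a list of
    {character: probability} dictionaries, one per segmented character.
    """
    best = None
    for alt in alternatives:
        if len(alt) != len(template):
            continue                      # segmentation does not fit the syntax
        text, score = "", 1.0
        for symbol, probs in zip(template, alt):
            allowed = {c: p for c, p in probs.items() if c in CLASSES[symbol]}
            if not allowed:
                break                     # no candidate of the required class
            c = max(allowed, key=allowed.get)
            text += c
            score *= allowed[c]
        else:
            if best is None or score > best[1]:
                best = (text, score)
    return best[0] if best else None

six = [
    {"J": 0.55, "5": 0.45},
    {"4": 1.0},
    {"Y": 0.9, "4": 0.1},
    {"2": 1.0},
    {"9": 0.6, "A": 0.4},
    {"3": 1.0},
]
# The five-character alternative is trimmed because it cannot match the template.
reading = read_with_syntax("99A9A9", [six[:5], six])
```

Restricting each position to one class is equivalent to consulting only the numeric or only the alpha sub-net there, which is how the ambiguous first and fifth characters are resolved in this example.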


ZIP  Code 



Status and the Future


•  This  architecture  is  the  basis  for  the  product 
NestorReader 

• To be supported on Ni1000 neural net chip

•  Extensible  to  character-based  cursive 
recognition 

- Now developing much larger training and test sets for
run-on HP and cursive

- Developing statistics on character and segmentation
accuracy due to character and lexical context